---
title: "Analysing my photos with R and Exiftool"
output:
  html_notebook:
    code_folding: hide
  html_document: default
---

```{r Load Libraries, echo=TRUE}
library(ggplot2)
library(dplyr)
library(tidyr)
library(reshape2)
library(RColorBrewer)
library(knitr)
library(kableExtra)
```


In May 2017 I went on a five-week trip around Europe. Yes, it was nice. Of course it was! I took a lot of photos, so obviously when I got back I wrote some simple code to look at my photographic habits, inspired by [this post](https://timelyportfolio.github.io/rCharts_catcorrjs/exif/). I used [exiftool](https://www.sno.phy.queensu.ca/~phil/exiftool/) to dump some data into a CSV, then pulled it all into R to see whether my patterns were strong enough to justify changing either how I shoot or my kit.

### Why am I doing this?

I only had one lens on me at the time, and I often found myself in camera shops because I felt my lens was limiting the kind of photos I could take. I figured I would look at the exif data from the photos I took and see whether I was bumping up against the constraints of the kit zoom lens I was using. The question I'm trying to answer is:

> Is my lens holding me back? 
>
> Do I have an excuse (uh ... reason) to buy a new one?

### What am I looking for?
The key features in a lens are the focal length and the maximum size of the aperture (the opening that lets in the light), so I'll focus on those. I was using a kit zoom lens with my Sony a6000, which covers focal lengths of 16-50mm and has a maximum aperture between f3.5 and f5.6 depending on how far you have zoomed in (see the chart further down).

I like taking pictures of architecture, and I also like shooting at night, so I tend to want a bigger aperture (smaller f number) and a wider frame (shorter focal length). Whether that impression is correct is for the data to reveal.

<center>
<br>
<img src="./ref/NervousLion.png" alt="One of my masterpieces" width="750px" />
<br>
*A very nervous looking lion as seen out the front of the Pergamon Museum in Berlin*
<br><br>
</center>

*Note:* I use a fair bit of photography jargon here. Aperture is particularly counterintuitive because a large opening, which lets a lot of light onto the sensor, has a small f-stop value. See [this Quora question](http://bit.do/etvj9) for more detail.
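
As a rough illustration of why a small f-stop means a big opening: the f-number is the focal length divided by the diameter of the aperture, and the light reaching the sensor scales with one over the f-number squared. The function names below are made up for this sketch, and the numbers are just the lens specs quoted in this post.

```r
# f-number N = focal length / aperture diameter, so diameter = focal length / N
aperture_diameter_mm <- function(focal_length_mm, f_number) {
  focal_length_mm / f_number
}

# Exposure scales with 1 / N^2, so compare two f-stops by the squared ratio
relative_light <- function(f_number, reference_f_number) {
  (reference_f_number / f_number)^2
}

aperture_diameter_mm(16, 3.5)  # kit lens wide open at 16mm
aperture_diameter_mm(50, 5.6)  # kit lens wide open at 50mm
relative_light(1.8, 3.5)       # f1.8 lets in roughly 3.8 times the light of f3.5
```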


### Get the data out of *exiftool*

The first thing to do is to get the exif data. Exiftool has lots and lots of [options for getting specific data out](https://www.sno.phy.queensu.ca/~phil/exiftool/exiftool_pod.html#READING-EXAMPLES), so you can request exactly the fields you're after. I requested a few variables relating to aperture and focal length (see the code snippet), used the exiftool command line to write them to a CSV, and then read that into R.

*PS:* I only found out later that there's an R package called [`exifr`](https://github.com/paleolimbot/exifr) that would have made this a bit easier. You can basically do what my code does except go straight from exiftool to a dataframe, rather than via a CSV. Who would say no to that?! If I repeat this with another batch of snaps I'll definitely use exifr.
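
For the curious, a minimal sketch of what the `exifr` route might look like. This is untested against my photo library: `read_trip_exif` is a hypothetical helper name, the path in the example call is a placeholder, and it assumes both the `exifr` package and exiftool itself are installed.

```r
# Hypothetical helper: read the same tags straight into a data frame with exifr.
# Assumes the exifr package and exiftool are installed.
read_trip_exif <- function(photo_dir) {
  if (!requireNamespace("exifr", quietly = TRUE)) {
    stop("The exifr package is needed for this sketch")
  }
  exifr::read_exif(
    path = photo_dir,
    tags = c("DateTimeOriginal", "Model",
             "MaxApertureValue", "Aperture", "FocalLength"),
    recursive = TRUE  # walk sub-folders, like exiftool's -r flag
  )
}

# Example call (placeholder path):
# exif_raw <- read_trip_exif("path/to/photos")
```

The values may come back in a slightly different form than in the CSV (for example, focal length without its unit), so the cleaning step would need checking before reusing it as-is.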

```{r Generate Exif Data and Import}
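# Ask exiftool for a recursive CSV dump of just the tags needed here.
# On macOS/Linux system2() hands the command to a shell, so the inner double
# quotes protect the space in the path while the trailing * is still expanded.
system2(command = 'exiftool', 
        args = c('-csv', '-r',
                 '-DateTimeOriginal', '-Model',
                 '-MaxApertureValue', '-Aperture', '-FocalLength',
                 '"/Volumes/SO SPACE!/Photos/17-05"*'),
        stdout = './inputs/my_photos.csv')
exif_raw <- read.csv('./inputs/my_photos.csv', stringsAsFactors = FALSE)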
```

### Clean

To begin with I'll remove any image entries that don't have a complete set of the data I'm looking at. But for peace of mind we'll take a look at what was removed.

```{r echo=TRUE, results='asis'}
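# Keep only rows with every field present, then strip the unit from
# FocalLength (e.g. "16.0 mm") so it can be treated as a number
exif_clean <- exif_raw[complete.cases(exif_raw),] %>% 
  mutate(FocalLength = as.numeric(substr(FocalLength, 1, regexpr("( m)", FocalLength)-1)))

# Show the rows that were dropped
kable(exif_raw[!complete.cases(exif_raw),]) %>% 
  kable_styling(bootstrap_options = c("bordered") , full_width = F)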
```

It looks like the only incomplete data point we picked up in the initial batch of data was a movie file (.mp4), and I'm not particularly interested in that so I'm happy to let it go.
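
The focal length arrives as text with its unit attached, which is why the cleaning step has to strip it before converting to a number. Here is a small sketch on made-up sample values showing what that `substr()`/`regexpr()` expression does, next to a `sub()` alternative that should give the same result for values in this format.

```r
# Made-up sample values in the "number, space, mm" format seen in the CSV
focal_length_raw <- c("16.0 mm", "23.5 mm", "50.0 mm")

# The approach from the cleaning chunk: keep everything before the first " m"
via_substr <- as.numeric(
  substr(focal_length_raw, 1, regexpr("( m)", focal_length_raw) - 1)
)

# An alternative: delete the trailing " mm" and convert what is left
via_sub <- as.numeric(sub(" mm$", "", focal_length_raw))

identical(via_substr, via_sub)
```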

### First Glance
Seasoned photographers will recognise here that there's an inherent tension between aperture and focal length. I'll spare you the physics and give you a chart instead:
```{r Plot Max Aperture Value, echo=TRUE}
# Put both cameras on an APS-C-equivalent scale: anything that is not the Sony
# (ILCE-6000) is treated as the micro 4/3 Olympus and scaled by 2/1.53
fl_aperture_all <- exif_clean %>%
  select(-SourceFile, -DateTimeOriginal) %>%
  mutate(Aperture = ifelse(Model == "ILCE-6000", Aperture, Aperture*2/1.53),
         MaxApertureValue = ifelse(Model == "ILCE-6000", MaxApertureValue, MaxApertureValue*2/1.53),
         FocalLength = ifelse(Model == "ILCE-6000", FocalLength, FocalLength*2/1.53),
         Model = ifelse(Model == "ILCE-6000", "A6000 (+Sony 16-50mm f3.5-5.6)", "Pen-F (+Olympus 17mm f1.8)"))

# One row per photo, using the f-stop it was actually shot at
fl_aperture_act <- fl_aperture_all %>% 
  select(-MaxApertureValue) %>% 
  mutate(Source = 'Photo')

# One row per model and focal length: the widest aperture the lens allows there
fl_aperture_max <- fl_aperture_all %>% 
  select(-Aperture) %>% 
  group_by(Model, FocalLength) %>% 
  summarise(Aperture = min(MaxApertureValue)) %>% 
  ungroup() %>% 
  mutate(Source = 'Max Aperture')

fl_aperture <- bind_rows(fl_aperture_act, fl_aperture_max)
x_breaks <- seq(16, 50, by = 2)

scatter_plot <- fl_aperture %>% 
  ggplot(aes(x = FocalLength, y = Aperture, colour = Model, alpha = Source, shape = Source)) +
  geom_point(size = 3) +
  scale_x_continuous(breaks = x_breaks) + 
  # Named values so the mapping does not depend on the order of the levels
  scale_alpha_manual(values = c('Max Aperture' = 1, 'Photo' = 0.05)) +
  scale_shape_manual(values = c('Max Aperture' = 0, 'Photo' = 16)) +
  labs(x = "Focal Length (mm APS-C Equivalent)",
       y = "f-stop/Aperture (smaller number -> more light)") +
  coord_cartesian(xlim = c(16, 50), ylim = c(0, 30))
```

```{r Plot}
scatter_plot
```
The squares that sit at the bottom of the chart mark the minimum f-stop (which corresponds to the largest aperture size) at a given focal length. What you can see is that the minimum f-stop increases slowly as you go to a longer focal length.

There are a few things raised by this chart:

1. I got to try an Olympus Pen with a 17mm f1.8 prime lens on a tour I did in Frankfurt, and it demonstrates pretty clearly what a new piece of kit can do to expand your creative scope.[^1]
2. You can see that I took a fair number of photos at the lower f-stops, but the dots at 16mm and 50mm are the darkest, and not just at the lower f-stops.
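
To make the sensor-size correction from the footnote concrete, here is the arithmetic for the Olympus prime as a small sketch. The crop factors are the ones used in this post (2 for micro 4/3, 1.53 for the Sony's APS-C sensor); `to_apsc_equivalent` is just a name made up for the example.

```r
# Crop factors used in this post
crop_m43  <- 2     # micro 4/3 (the Olympus)
crop_apsc <- 1.53  # APS-C (the Sony)

# Convert a micro 4/3 focal length or f-stop to its APS-C equivalent
to_apsc_equivalent <- function(x) x * crop_m43 / crop_apsc

round(to_apsc_equivalent(17), 1)   # the 17mm prime behaves like about 22.2mm
round(to_apsc_equivalent(1.8), 2)  # f1.8 behaves like about f2.35
```

So even after the correction the prime sits well below the f3.5 floor of the kit lens at a similar focal length.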

We can tell that the main game here is focal length, and that aperture is less important (at least for this batch). To get a more accurate picture of exactly how most of my photos are taken, I plotted the same data as a heat map, this time just for the Sony with the kit lens. The colouring on the heat map gives a fair bit more resolution on the photo density at each combination of focal length and aperture.
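
The idea behind the heat map data is simply "count the photos in each focal length and f-stop cell, and fill the empty cells with zero". Below is a minimal sketch of that idea on made-up data, using `dplyr::count()` and `tidyr::complete()`. Note this is a different route from the `dcast()`/`melt()` combination used in this post, and `toy_shots` is invented purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical shots: focal length (mm) and f-stop for six photos
toy_shots <- data.frame(
  FocalLength = c(16, 16, 16.2, 35, 50, 50),
  Aperture    = c(3.5, 3.5, 5.6, 5.6, 5.6, 8)
)

toy_counts <- toy_shots %>%
  # Bin focal lengths to the nearest millimetre
  mutate(FocalLength_bin = round(FocalLength)) %>%
  # Count photos in each (focal length, f-stop) cell
  count(FocalLength_bin, Aperture, name = "Count") %>%
  # Add the combinations that never occurred, with a count of zero
  complete(FocalLength_bin, Aperture, fill = list(Count = 0L))

toy_counts
```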

```{r Heatmap}
# Count photos at each (rounded focal length, f-stop) combination for the Sony.
# dcast() spreads the counts into a wide table and melt() brings them back to
# long form; fun.aggregate is given explicitly to avoid the
# "Aggregation function missing: defaulting to length" message.
exif_fl_ap <- fl_aperture %>% 
  filter(Source == "Photo",
         Model == "A6000 (+Sony 16-50mm f3.5-5.6)") %>% 
  mutate(FocalLength_bin = round(FocalLength),
         Aperture_bin = Aperture) %>% 
  dcast(FocalLength_bin~Aperture_bin, value.var = "Aperture",
        fun.aggregate = length) %>% 
  melt(id = c("FocalLength_bin"), 
       variable.name = "Aperture_bin",
       value.name = "Count")

# Build the full grid of focal lengths and f-stops so empty cells show as zero
heatmap_fl_ap <- expand.grid(FocalLength = seq(min(exif_fl_ap$FocalLength_bin), 
                                               max(exif_fl_ap$FocalLength_bin),
                                               1),
                             Aperture = unique(exif_fl_ap$Aperture_bin)) %>% 
  left_join(exif_fl_ap, by = c("FocalLength" = "FocalLength_bin", "Aperture" = "Aperture_bin")) %>% 
  mutate(Count = ifelse(is.na(Count), 0, Count))

heatmap_plot <- ggplot(heatmap_fl_ap, aes(x = FocalLength, y = Aperture)) + 
  geom_raster(aes(fill = Count), interpolate = FALSE) +
  scale_x_continuous(breaks = x_breaks) +
  scale_fill_gradientn(colours = rev(brewer.pal(5, "RdYlBu"))) +
  theme(panel.background = element_blank())
```

```{r Plot Heatmap}
heatmap_plot
```

Well, if I was looking for an indication of my constraints, I think I found it. The *vast* majority of my photos seem to be taken at 16mm on the kit lens:

```{r plot histogram, fig.height=1, fig.width=4}
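# Share of Sony kit-lens photos taken at each focal length
fl_aperture_act %>%
  filter(Model == 'A6000 (+Sony 16-50mm f3.5-5.6)') %>%
  ggplot(aes(x = factor(FocalLength))) + 
  # after_stat() replaces the older ..count.. notation (ggplot2 3.3.0 and later)
  geom_bar(aes(y = after_stat(count / sum(count)))) + 
  scale_y_continuous(labels = scales::percent) +
  labs(x = 'Focal length (mm)', y = 'Share of photos')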
```

The relative proportions here are almost ridiculous, so I guess I have my answer 😎

### Some final remarks

There's a lot more to dig into here. For one, I deliberately keep my aperture in the f5.6 to f8 range because that's where I get the best image quality, so seeing a large number of shots at f3.5 still raises my eyebrow slightly. Maybe I should get a wide angle lens with a large aperture? Sounds expensive ...

On the data front, though, there's a lot more on offer in exiftool. I haven't looked at how I could better use ISO or how my photography style has changed over time. Next time ...


[^1]: I have corrected both the aperture and focal length values for the effect of sensor size to get an apples-to-apples comparison between the Sony and the Olympus. In this case the micro 4/3 numbers were multiplied by 2/1.53 (the micro 4/3 crop factor divided by the Sony's APS-C crop factor) to convert them to APS-C equivalent numbers.